Phylogenetic incongruence in an Asiatic species complex of the genus Caryodaphnopsis (Lauraceae)

Background Caryodaphnopsis, a group of tropical trees (ca. 20 spp.) in the family Lauraceae, has an amphi-Pacific disjunct distribution: ten species are distributed in Southeast Asia, while eight species are restricted to tropical rainforests in South America. Previously, phylogenetic analyses using two nuclear markers resolved the relationships among the five species from Latin America. However, the phylogenetic relationships between the species in Asia remain poorly known. Results Here, we first determined the complete mitochondrial genome (mitogenome), plastome, and nuclear ribosomal cistron (nrDNA) sequences of C. henryi, with lengths of 1,168,029 bp, 154,938 bp, and 6495 bp, respectively. We found 2233 repeats and 368 potential SSRs in the mitogenome of C. henryi and 50 homologous DNA fragments between its mitogenome and plastome. Gene synteny analysis revealed extensive rearrangements in the mitogenomes of Magnolia biondii, Hernandia nymphaeifolia, and C. henryi and only six conserved clustered genes among them. In order to reconstruct relationships for the ten Caryodaphnopsis species in Asia, we created three datasets: one for the mitogenome (coding genes and ten intergenic regions), another for the plastome (whole genome), and the other for the nuclear ribosomal cistron. All of the 22 Caryodaphnopsis individuals were divided into four, five, and six different clades in the phylogenies based on mitogenome, plastome, and nrDNA datasets, respectively. Conclusions The study showed phylogenetic conflicts within and between nuclear and organellar genome data of Caryodaphnopsis species. The sympatric Caryodaphnopsis species in Hekou and Malipo, SW China, may have been affected by incomplete lineage sorting, chloroplast capture, and/or hybridization, which mixed the species into a complex during their evolutionary history. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-024-05050-3.


Background
Trees remain a fundamental component of forest ecosystem stability, with around 73,000 species accounting for almost 20% of global plant species diversity [1]. An estimated 9,000 tree species remain undiscovered, of which roughly half to two-thirds are still waiting to be identified in tropical and subtropical forests [1]. These broadleaved tree species are often associated with rapid diversification and frequent introgression, which compound taxonomic confusion. For instance, studies on Chinese oaks have revealed negative linear relationships between diversification rates and genetic variation, suggesting complex associations between morphological divergence and species diversification [2]. Similarly, the Pedicularis siphonantha complex in southwest China has shown rapid diversification, frequent introgression, and cryptic species complexes, highlighting the challenges of species delimitation based on morphological characters [3]. The phenotypes and genetic lineages of the tropical and subtropical tree species have narrowed over time in similar environments [4]. Species identification, delimitation, and description usually depend on morphological characters, but these traits often fail to distinguish recently diverged species in tropical and subtropical forests, leading to long and controversial debates such as those over species complexes [4,5].
A species complex is a group of taxa consisting of multiple species-level lineages that cannot be reliably separated using ordinary knowledge [6]. Resolution is often hindered by their cryptic nature, making it difficult to distinguish them using traditional methods like external morphology [4,6]. A variety of methods for studying species complexes have been continuously improved, including analyzing differences in individual traits, conducting reproductive isolation tests, and applying DNA-based techniques such as molecular phylogenetics [7]. These approaches help researchers determine the boundaries between closely related organisms within a species complex. By examining the genetic, morphological, and ecological characteristics of these organisms, it is possible to identify cryptic species, hidden sibling species, and other components of a species complex [8,9]. Recent research has revealed that, in difficult lineages such as bamboos, palms, oaks, rosids, and camellias [10][11][12][13][14], nominal species may actually represent groups of closely related species that are sometimes morphologically indistinguishable.
In plants, the mitochondrion and the chloroplast are two DNA-containing organelles. Both have low rates of nucleotide substitution, significant variation in genome sizes, and abundant repetitive sequences [15]. Recombination is crucial for DNA replication in all organisms [16]. Mitochondrial homologous recombination essentially refers to reversible, frequent exchange of large repeats, which, if not harmful to mitochondrial function, could be retained, leading to an overall increase in mitogenome size [17]. Assembly of mitogenomes is challenging due to their large repetitive sequences and multipartite structures. Second-generation sequencing combined with third-generation sequencing helps in assembling and discovering these structures [18].
Recent advances in sequencing technologies have greatly improved the acquisition of large amounts of genomic data, making it ideal for phylogenetic analysis. Plastomes, the complete DNA sequences of chloroplasts, are widely utilized in phylogenetic studies in the family Lauraceae due to their ease of sequencing, assembly, and annotation [19][20][21]. The mitogenome has been increasingly included in phylogenetic analyses of angiosperms [22][23][24]. It is the diversity of genomic data that has brought the discordance between organellar and nuclear signals into focus [25,26]. Cytonuclear discordance refers to the incongruence between the evolutionary histories of nuclear and cytoplasmic genomes within a species or a group of species. Such discordance is usually attributed to hybridization, incomplete lineage sorting, or horizontal gene transfer [27]. Recent studies have indeed highlighted the prevalence of cytonuclear discordance in various plant species [28][29][30].
Caryodaphnopsis has no reliable fossil record, but two molecular analyses have dated the separation between species in Asia and America to the middle Eocene (44 or 48 million years ago) [44,45]. Both geographical groups were supported as monophyletic by previous phylogenetic analyses. The first reported chloroplast marker in Caryodaphnopsis was matK, which was used for phylogenetic analysis within the Lauraceae and suggested that C. tonkinensis formed a weakly supported monophyletic clade [46]. After that, Chanderbali et [45], which comprised species from two geographical groups, was strongly supported. In those phylogenetic analyses, the relationships among five South American species were well resolved, i.e., an unidentified Caryodaphnopsis species and its sister group containing C. burger, C. fosteri, and C. inaequalis, followed by C. cogolloi [41]. However, the relationships among species in Asia have not been resolved due to low sequence divergence in the ITS and RPB2 markers. The separation of three individuals of C. tonkinensis into two branches may represent sample misidentifications or indicate intraspecific diversity [45].
In this study, we completed the assembly and annotation of the mitogenome of C. henryi. Data from 21 Caryodaphnopsis individuals in Asia were collected. The goals of this study were to (1) determine the first complete mitogenome in the Lauraceae family; (2) reveal the genomic characteristics and structural features of C. henryi; and (3) reconstruct the nuclear, chloroplast, and mitochondrial phylogenies of the Caryodaphnopsis species in Asia.

Plant material and geographic distributions
Fresh leaves and silica-gel dried materials were collected from ten Caryodaphnopsis species from China and Vietnam. Distribution data were compiled from herbarium records, and the voucher specimens were deposited in the Herbarium of Guangxi Normal University (Table 1). Figure 1 depicts the fruits of six Caryodaphnopsis species. In addition, the plastome sequence of C. henryi was deposited in the Lauraceae Chloroplast Genome Database (LCGDB, LAU00015, https://lcgdb.wordpress.com) [20]. The complete mitogenome sequence of C. henryi was deposited in NCBI (OR987149).

DNA extraction and sequencing
High-quality genomic DNA from leaves of the ten Caryodaphnopsis species was delivered to Tianjin Novogene Company for Illumina library preparation and second-generation sequencing. Genomic DNA was isolated from 2 g of fresh or silica-dried leaves using the CTAB technique [49] with 4% CTAB, 1% PVP, and 0.2% DL-dithiothreitol. The cleaved DNA fragments were utilized to build 500 bp short-insert libraries, according to the manufacturer's handbook (Illumina). Each DNA sample received > 4.0 Gb of data from a Genome Analyzer (Illumina HiSeq 2500) at BGI-Shenzhen after being indexed by tags and pooled in one lane. A total of 27.9 Gb of sequence reads with a length of 150 bp were obtained for C. henryi using second-generation sequencing. DNA from young leaves of C. henryi was extracted and sequenced on the Oxford Nanopore PromethION platform for third-generation sequencing. High-quality genomic DNA was extracted from the leaves using the SDS method. After library construction using SQK-LSK109 (Oxford Nanopore Technologies), DNA sequencing was performed on the PromethION platform, and 20.7 Gb of raw data with an average read size of 27,600 bp were produced.

Genome assembly and annotation
The unlooped mitogenome, complete plastome, and nrDNA sequences for the Caryodaphnopsis samples were assembled using GetOrganelle 1.7.5 [50]. To assemble a complete mitogenome of Caryodaphnopsis henryi, the Illumina sequencing data of C. henryi were initially assembled using GetOrganelle [50]. After the Nanopore third-generation sequencing reads of C. henryi were obtained, the adaptors were first trimmed using Porechop. The trimmed reads were then aligned to the scaffolds assembled by GetOrganelle using BLAST+ with the parameter -evalue 1e-200 [51] to obtain the subset of long reads that were similar to mitochondrial sequences. Finally, these long reads and the mitochondria-related short reads that were extended by GetOrganelle were used together for hybrid assembly, which was performed with the Unicycler pipeline [52]. The assembly of C. henryi yielded two putative mitochondrial sequences: a linear sequence of 968,798 bp and a circular sequence of 199,231 bp. Mitogenomes were annotated using GeSeq [53], with Liriodendron tulipifera (KC821969) and Magnolia biondii (MN206019) as references. Subsequently, a detailed annotation was performed with references in Geneious Prime [54]. The circular mitogenome map was visualized using OGDRAW [55].
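The read-selection step described above can be illustrated with a short Python sketch. It assumes that the BLAST+ results were written in the standard 12-column tabular layout (-outfmt 6), with the Nanopore read as the query in column 1 and the e-value in column 11; the output format, the function name, and the toy hit lines are assumptions for illustration and are not taken from the study.

```python
import csv
import io


def select_mito_like_reads(blast_tabular, max_evalue=1e-200):
    """Return IDs of reads with at least one hit at or below max_evalue.

    blast_tabular is any iterable of lines in BLAST+ tabular format
    (assumed here to be -outfmt 6: query ID first, e-value in column 11).
    """
    keep = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            keep.add(read_id)
    return keep


# Three made-up hits: read2 is too weak to pass the 1e-200 cutoff.
example = io.StringIO(
    "read1\tscaffold_1\t98.5\t12000\t150\t30\t1\t12000\t500\t12500\t0.0\t21000\n"
    "read2\tscaffold_1\t85.0\t300\t40\t5\t1\t300\t900\t1200\t1e-50\t380\n"
    "read3\tscaffold_2\t97.0\t8000\t200\t40\t1\t8000\t10\t8010\t1e-220\t14000\n"
)
selected = select_mito_like_reads(example)
print(sorted(selected))  # ['read1', 'read3']
```

Such a strict cutoff mainly retains long, near-identical alignments, which is presumably why it suits the selection of long reads for hybrid assembly.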

Phylogenetic analysis
All the sequence matrices were aligned with the MAFFT program (version 7.31) [59] and manually adjusted in Geneious (version 9.1.7) [54]. The three datasets of mitochondrial, chloroplast, and nuclear ribosomal cistron sequences were composed as follows: the mitochondrial dataset had 41 protein-coding genes, nine intron sequences, and ten intergenic regions; the chloroplast dataset used complete chloroplast genome sequences; and the nrDNA dataset was ETS-18S-ITS1-5.8S-ITS2-26S. A maximum likelihood (ML) analysis was carried out with IQ-TREE (version 2.1.2) [60] using 1000 ultrafast bootstrap replicates. The DNA substitution models chosen were TVM+I+G (mtDNA), GTR+I+G (cpDNA), and GTR+G (nrDNA). The Bayesian inference (BI) analysis, based on the GTR+F+I (mtDNA), GTR+F+I+G4 (cpDNA), and GTR+F+I (nrDNA) models, was performed with MrBayes (version 3.2.7) [61]. The BI analysis started with a random tree and sampled every 1000 generations. The first 20% of the trees were discarded as burn-in, and the remaining trees were used to generate a majority-rule consensus tree [62]. Phylogenetic trees were visualized and edited with FigTree software (version 1.4.0).
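The burn-in and majority-rule steps can be illustrated with a simplified Python sketch. It is not the MrBayes implementation: each sampled tree is represented only as a set of clades (each clade a frozenset of taxon labels), the toy taxa A to D are invented, and the 50% threshold is the usual majority-rule convention rather than a setting reported in the text.

```python
from collections import Counter


def majority_rule_clades(sampled_trees, burnin_fraction=0.2, threshold=0.5):
    """Discard the burn-in samples and return clade frequencies above threshold.

    sampled_trees is a list of trees in sampling order; each tree is a set
    of clades, and each clade is a frozenset of taxon labels.
    """
    n_burn = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burn:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: count / len(kept)
        for clade, count in counts.items()
        if count / len(kept) > threshold
    }


AB = frozenset({"A", "B"})
ABC = frozenset({"A", "B", "C"})
ABD = frozenset({"A", "B", "D"})
CD = frozenset({"C", "D"})

# Ten toy samples: two early trees (burn-in), then six of one topology
# and two of another.
samples = [{CD}] * 2 + [{AB, ABC}] * 6 + [{AB, ABD}] * 2
support = majority_rule_clades(samples)
print(support[AB], support[ABC])  # 1.0 0.75
```

The retained frequencies correspond to the posterior probabilities (PP) shown on the consensus tree; clades below the threshold, such as ABD here, are collapsed.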

Organelle genome features
The DNA of C. henryi was extracted and sequenced using the Illumina HiSeq 2500 and Oxford Nanopore PromethION platforms for second- and third-generation sequencing, respectively. A total of 27.9 Gb of raw reads of 150 bp in length and about 20.7 Gb of Nanopore long-read data with an average read size of 27,600 bp were used for genome assembly. We successfully assembled the whole mitogenome and chloroplast genome of C. henryi using Illumina short reads and Nanopore long reads. The mitogenome consists of one large linear contig (968,798 bp) and one small circular contig (199,231 bp), and the chloroplast genome is a single circular contig of 154,938 bp. With a total length of 1,168,029 bp, the overall base composition of the entire mitogenome is as follows: A: 26.7%, T: 26.5%, G: 23.4%, C: 23.4%, with a G + C content of 46.8%. The positions of all the genes identified in the C. henryi mitogenome and the functional categorization of these genes are presented (Fig. 2A). The mitogenome contains 65 unique genes, including 41 protein-coding genes (PCGs), 21 transfer RNA (tRNA) genes, and 3 ribosomal RNA (rRNA) genes (Table 2). The chloroplast genome, with a length of 154,938 bp (39% G + C content), contains 113 unique genes, including 79 protein-coding genes, 30 tRNA genes, and 4 rRNA genes (Fig. 2B).
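For readers who wish to reproduce base-composition figures of this kind, the following minimal Python sketch shows the calculation on a toy sequence. The function name and the toy sequence are illustrative assumptions; the final lines only check that the percentages reported above are internally consistent.

```python
from collections import Counter


def base_composition(seq):
    """Return percentages of A, C, G, T and G + C, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    percent = {base: 100.0 * counts[base] / total for base in "ACGT"}
    percent["GC"] = percent["G"] + percent["C"]
    return percent


toy = "ATGCGCATTA"  # made-up sequence, not from the mitogenome
comp = base_composition(toy)
print(comp["A"], comp["T"], comp["GC"])  # 30.0 30.0 40.0

# Consistency check on the values reported in the text.
reported = {"A": 26.7, "T": 26.5, "G": 23.4, "C": 23.4}
print(round(sum(reported.values()), 1))          # 100.0
print(round(reported["G"] + reported["C"], 1))   # 46.8
```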
The C. henryi mitogenome sequence was approximately 7.5 times longer than its chloroplast genome. Between the mitogenome and plastome, we found a total of 50 homologous DNA fragments (Table S1, Fig. 4). The length of the fragments ranged from 39 to 5262 bp. The inserted fragments totalled 23,583 bp in length, accounting for 2.02% of the length of the mitogenome. Six tRNA genes were located in these fragments (trnH-GUG, trnM-CAU, trnN-GUU, trnV-GAC, trnW-CCA, trnP-UGG). We also detected fragments of chloroplast genes, such as rrnS and trnD-GUC, in the mitogenome.
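The proportions quoted above follow directly from the reported lengths. The short Python check below uses only numbers stated in the text; the variable names are illustrative.

```python
LINEAR_CONTIG_BP = 968_798
CIRCULAR_CONTIG_BP = 199_231
MITOGENOME_BP = 1_168_029
PLASTOME_BP = 154_938
TRANSFERRED_BP = 23_583  # total length of the 50 homologous fragments

# The two mitochondrial contigs add up to the reported mitogenome length.
contigs_sum = LINEAR_CONTIG_BP + CIRCULAR_CONTIG_BP

size_ratio = MITOGENOME_BP / PLASTOME_BP
transferred_pct = 100 * TRANSFERRED_BP / MITOGENOME_BP

print(contigs_sum == MITOGENOME_BP)                   # True
print(f"{size_ratio:.1f}x, {transferred_pct:.2f}%")   # 7.5x, 2.02%
```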

Phylogeny of mitochondrial sequences
With the reference mitogenome of C. henryi, we further assembled 60 mitochondrial regions, including 41 mitochondrial protein-coding gene sequences, nine intron sequences, and ten intergenic region sequences, for 21 individuals of ten Caryodaphnopsis species in Asia. The mitochondrial matrix (Table 3) based on the 60 regions comprises 166,057 characters, of which 363 (depending on the consensus threshold) are parsimony-informative characters (PICs). The mitochondrial matrix was used to reconstruct phylogenetic trees, with two Neocinnamomum species serving as outgroups (Fig. 6A).
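As an illustration of how parsimony-informative characters are counted, the Python sketch below applies the standard definition: a column is informative when at least two character states each occur in at least two sequences. Treating gaps, N, and ? as missing data is an assumption of this sketch, and the four-sequence alignment is invented; the software and settings actually used for the count are not stated in this section.

```python
from collections import Counter


def count_parsimony_informative(alignment, missing="-N?"):
    """Count alignment columns with >= 2 states that each occur >= 2 times."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_informative = 0
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() not in missing)
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            n_informative += 1
    return n_informative


toy_alignment = [
    "ACGTA",
    "ACGTT",
    "ATGCA",
    "ATG-T",
]
n_pics = count_parsimony_informative(toy_alignment)
print(n_pics)  # 2 (columns 2 and 5)
```

Column 4 of the toy alignment is variable but not informative, because the state C occurs only once and the gap is ignored.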

Phylogeny of plastome sequences
The

Phylogeny of nuclear ribosomal cistron sequences
The nrDNA sequences of 21 individuals from ten Caryodaphnopsis species in Asia were newly determined in this study. The lengths of the nrDNA sequences ranged from 6482 bp (C. bilocellata) to 6537 bp (C. metallica) (Table 5). Three rRNA genes and three transcribed spacers were found in these nrDNA sequences. For the 26S

General features of mitogenome
This study presents the first complete mitogenome for woody plants in the family Lauraceae, obtained by Illumina and Nanopore sequencing technologies (Fig. 2A). To date, mitogenomes have been sequenced for three orders and eight families within the magnoliids. The mitogenome of Caryodaphnopsis henryi, with a length of 1,168,029 bp, is larger than the mitogenomes of both Hernandia nymphaeifolia and Magnolia biondii [63].
Table 3 The 60 mitochondrial segments were used to reconstruct the phylogenetic relationships
The size variation of mitogenomes in land plants can be influenced by a variety of factors, including retrotransposon proliferation, the generation of repetitive DNA through homologous recombination, the incorporation of foreign sequences via intracellular transfer from the chloroplast or nuclear genome, or horizontal transfer of mitochondrial DNA [63,64]. Such variation has been reported in many plant species, with mitogenome sizes ranging from 66 kb in Viscum scurruloideum [65] to as large as 11 Mb in Silene conica [66]. However, in different species, the increase in mitogenome size could be caused by different factors [67].
On the one hand, a total of 2233 repeats and 368 SSRs were identified in the mitogenome of Caryodaphnopsis henryi (Fig. 3B). The mitogenome exhibited a significant number of dispersed repeats, primarily consisting of tandem, forward, and palindromic repeats (Fig. 3A). These repeats are critical for the recombination of the mitogenome, as they are one of the causes of variation affecting the size and structure of the mitogenome [65]. The presence of repeated sequences in the mitogenome can increase the possibility of recombination, leading to variations in the genome structure, which in turn could relate to gene expression and function [66].
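A simple way to see what an SSR count of this kind involves is a regular-expression scan for perfect tandem repeats of 1 to 6 bp motifs. The Python sketch below is illustrative only: the minimum repeat numbers (10, 5, 4, 3, 3, 3) are assumed, commonly used thresholds, not the settings of this study, and the tool used to obtain the 368 SSRs is not named in this section. The test sequence is invented.

```python
import re

# Assumed minimum repeat counts per motif length (not from the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return all(
        motif != motif[:k] * (len(motif) // k)
        for k in range(1, len(motif))
        if len(motif) % k == 0
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # e.g. skip "AA" repeats, kept as "A"
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


toy_seq = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GTT" + "AGC" * 4 + "T"
ssrs = find_ssrs(toy_seq)
print(ssrs)  # [(3, 'A', 10), (16, 'AT', 5), (29, 'AGC', 4)]
```

This sketch reports only perfect repeats and does not merge compound or interrupted SSRs, so its counts would not necessarily match those of dedicated SSR software.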
On the other hand, the structure and evolutionary process of the plant mitogenome make it more prone to accepting and integrating foreign DNA [64]. Horizontal gene transfer from chloroplasts to mitochondria has been reported multiple times, but the length and number of transferred fragments vary significantly between species. In this study, we found 50 homologous DNA fragments in Caryodaphnopsis henryi (Fig. 4), transferred from the chloroplast genome to the mitogenome. Thus, the length variation of intergenic regions might contribute to the length difference of the mitogenome in magnoliid lineages, primarily due to frequent recombination of repeated sequences and integration of foreign ones during evolution [67].
Genome rearrangement events, such as gene order changes, can reflect evolutionary distance and niche adaptation between species [68]. These events are responsible for creating extant species with conserved genes in different positions across genomes, and close species tend to have a similar set of genes or share most of them [69,70]. Gene synteny analysis revealed a succession of rearrangements in the mitogenomes of Magnolia biondii, Hernandia nymphaeifolia, and Caryodaphnopsis henryi (Fig. 5). Only six conserved gene clusters were shared among the three mitogenomes. The genome rearrangement events revealed by gene collinearity analysis can indeed reflect the evolutionary distance and niche adaptation between species [71,72].

Cytonuclear discordance
Cytonuclear discordance, which shows markedly different phylogenetic patterns between nuclear markers and cytoplasmic genes such as mitochondrial and chloroplast genes, has been observed in various plant populations and is often attributed to processes such as hybridization and incomplete lineage sorting [30,73,74]. Species of Caryodaphnopsis occupy different positions in the chloroplast and mitochondrial phylogenies relative to the nuclear phylogeny (Fig. 7). 73], balsam poplars [30], the Australian plant genus Adenanthos [74]. This inconsistency may reveal complex patterns of gene flow that these species may have experienced over the course of their evolution [75]. In addition, four individuals of C. tonkinensis are clustered together in the nuclear phylogeny but separated into different clades of the mitochondrial and chloroplast phylogenies. This separation of C. tonkinensis may represent intraspecific diversity, indicating that Caryodaphnopsis may contain a species complex.
Fig. 8 The geographic distribution and the fruit feature of Caryodaphnopsis are represented in the phylogenetic tree based on their mitochondrial, chloroplast, and nrDNA phylogenetic trees
Mitochondrial and chloroplast capture in plants is a phenomenon that occurs when a plant species acquires these organelles from another species through hybridization [76,77]. In our study, species of Caryodaphnopsis with different positions in the nuclear phylogeny are grouped together in the mitochondrial and chloroplast phylogenies. The individuals of C. latifolia and C. bilocellata were collected from Hekou, China. Based on the mtDNA and cpDNA data, our phylogenomic analysis shows that C. latifolia and C. bilocellata are sister to each other. Our phylogeny appears to reflect organellar capture from another species. The overlap of their geographic distributions may have resulted in mitochondrial and chloroplast capture. The overlapping geographic distribution of species (Fig. 8) can lead to gene flow and hybridization, potentially resulting in gene transfer between mitochondria and chloroplasts, which can affect their clustering on the phylogenetic tree [78,79].
Distribution sites of Caryodaphnopsis species show an interesting pattern of genetic diversity across regions with comparable species richness. Malipo county of China has only two Caryodaphnopsis species, which share similar chloroplast and mitochondrial genomes (Fig. 9). In contrast, Hekou county of China has six Caryodaphnopsis species with multiple types of chloroplast, mitochondrial, and nrDNA sequences. The genetic diversity within and between species could have implications for evolutionary processes, including adaptation to environmental stress, natural selection, and disease susceptibility [80]. The high genetic diversity observed in Hekou suggests that it may be a center of Caryodaphnopsis species distribution in Asia and a source of genetic resources for future conservation and breeding efforts. The phenomenon of higher genetic diversity in areas with greater species richness has been observed in other plant groups, such as those of tropical rainforests and alpine regions [81][82][83].

Conclusions
We assembled the complete mitogenome sequence of Caryodaphnopsis henryi, a tropical tree in the family Lauraceae. The whole mitogenome of C. henryi consists of one large linear contig of 968,798 bp and one small circular contig of 199,231 bp. The mitogenome contains 65 genes, including 41 protein-coding genes, 21 tRNA genes, and three rRNA genes. There are 50 homologous DNA fragments between the mitogenome and plastome of C. henryi. Comparative genomic analysis indicated that the sizes and gene orders of the three sequenced mitogenomes of C. henryi, Magnolia biondii, and Hernandia nymphaeifolia differed greatly. We found significant incongruence between the mitochondrial and nuclear or chloroplast phylogenies in a Caryodaphnopsis group. The study also revealed that sympatric Caryodaphnopsis species often cluster together in the chloroplast and mitochondrial phylogenetic trees.

Fig. 2 Gene maps of the Caryodaphnopsis henryi mitogenome (A) and chloroplast genome (B). The annotation of the genomes was performed using GeSeq. The genes that are drawn outside of the circle are transcribed clockwise, whereas those that are drawn inside the circle are transcribed counterclockwise
The 22 Caryodaphnopsis individuals were divided into four distinct groups. Group I included only one C. burger individual (ML-BS = 100%, BI-PP = 1.00). Group II included two C. henryi individuals (ML-BS = 95%, BI-PP = 1.00). Group III included individuals of C. bilocellata, C. latifolia, and a suspected new species, C. sp. 2 (ML-BS = 97%, BI-PP = 1.00). Group IV included individuals of C. laotica, C. malipoensis, C. metallica, C. tonkinensis, and two suspected new species, C. sp. 1 and C. sp. 3 (ML-BS = 97%, BI-PP = 1.00).

Fig. 4 Homologous sequences between the mitogenome and plastome of C. henryi. The blue circular segment represents the mitogenome, the green circular segment represents the plastome, and the line represents the homologous fragment. Different colors in the inner circle represent gene density

Fig. 5 Gene order in the mitogenomes of Hernandia nymphaeifolia, Magnolia biondii, and Caryodaphnopsis henryi. H. nymphaeifolia mitochondrial genes are shown on the left, C. henryi mitochondrial genes in the middle, and M. biondii mitochondrial genes on the right, with different colors signifying the relevant collinear sections

Fig. 6 Molecular phylogenetic trees of eleven species of Caryodaphnopsis based on mitochondrial (A), complete plastome (B), and nrDNA (C) sequences using unpartitioned Bayesian inference (BI) and maximum likelihood (ML). The trees were rooted with the sequences of Neocinnamomum fargesii and N. lecomtei. Numbers associated with the branches are ML bootstrap values (BS) and BI posterior probabilities (PP)

Fig. 7 Phylogenies obtained from the three different datasets: a mitochondrial; b nuclear ribosomal; c chloroplast

Fig. 9 Distribution of Caryodaphnopsis species in this study. Each site of the species is represented by a square point. The color of the square corresponds to the species' grouping with nrDNA data. The circle is divided into three parts, representing the groupings with nrDNA, mtDNA, and cpDNA data, respectively. The same color indicates species within a consistent group across the phylogenetic topologies. The world map was downloaded from the website of the Resource and Environment Science and Data Center (http://www.resdc.cn)

Table 1
Sampled species of Caryodaphnopsis and their voucher specimens in this study

Table 2
Genes, separated by category, encoded by the Caryodaphnopsis henryi mitogenome
A single asterisk (*) preceding a gene name indicates an intron-containing gene
The length of mitochondrial genes is similar among C. henryi, M. biondii, and H. nymphaeifolia. The 65 mitochondrial genes in C. henryi, with a total length of 41,938 bp, are 800 bp shorter in total than those of M. biondii and 49 bp longer than those of H. nymphaeifolia. The mitochondrial intronic and intergenic regions of C. henryi, with a total length of 1,126,191 bp, are 201,829 bp larger than those of M. biondii and 632,275 bp larger than those

Table 4
Summary of ten complete plastomes of Caryodaphnopsis

Table 5
Summary of ten complete nrDNAs of Caryodaphnopsis